library(dplyr)
library(survival)
library(mice)
library(MatchThem)
library(survey)
library(here)
library(gtsummary)
library(parallelly)
library(ranger)
library(furrr)
library(cobalt)
library(gsDesign)
source(here("functions", "source_encore.io_functions.R"))
# track time
runtime <- tictoc::tic()
3 Application in Cox PH models
Comparison of coxph versus svycoxph after multiple imputation and propensity score matching
In Chapter 3 we illustrate a reproducible example of how to use coxph (survival package (Therneau 2024)) and svycoxph (survey package (Lumley 2024)) in combination with multiple imputation by chained equations (mice package (Buuren and Groothuis-Oudshoorn 2011)) and propensity score matching using the MatchThem package (Pishgar et al. 2021).
First, we load the required R libraries/packages and some custom functions that are part of the encore.io R package that is being developed to streamline the analysis of all ENCORE trial emulations (non-public package).
3.1 Data generation
We use the simulate_flaura() function to simulate a realistic oncology comparative effectiveness analytic cohort with distributions similar to FLAURA, a randomized controlled trial that compared the efficacy and safety of osimertinib with standard-of-care (SoC) tyrosine kinase inhibitors (TKIs) in advanced NSCLC patients with a sensitizing EGFR mutation.
The following cohort resembles distributions observed in the EHR-derived EDB1 dataset used in ENCORE. Note: the values of some continuous covariates (labs) are displayed after log/log-log transformation.
# load example dataset with missing observations
data_miss <- simulate_flaura(
n_total = 3500,
seed = 42,
include_id = FALSE,
imposeNA = TRUE,
propNA = .33
)
# create Table 1
data_miss |>
tbl_summary(
by = "treat",
include = table1_covariates$covariate,
label = table1_labels
) |>
add_overall() |>
modify_header(
label ~ "**Patient characteristic**",
stat_0 ~ "**Total** <br> N = {N}",
stat_1 ~ "**Comparator** <br> N = {n} <br> ({style_percent(p, digits=1)}%)",
stat_2 ~ "**Exposure** <br> N = {n} <br> ({style_percent(p, digits=1)}%)"
) |>
modify_spanning_header(c("stat_1", "stat_2") ~ "**Treatment received**") |>
modify_caption("**Table 1. Patient Characteristics**")
Table 1. Patient Characteristics (Comparator/Exposure: treatment received)

| Patient characteristic | Total | Comparator | Exposure |
|---|---|---|---|
| Age at index date | 69 (64, 74) | 69 (64, 74) | 69 (64, 74) |
| Sex | 1,146 (33%) | 494 (33%) | 652 (32%) |
| Race | | | |
| Asian | 835 (36%) | 347 (35%) | 488 (36%) |
| Other | 54 (2.3%) | 22 (2.2%) | 32 (2.4%) |
| White | 1,449 (62%) | 625 (63%) | 824 (61%) |
| Unknown | 1,162 | 493 | 669 |
| Smoking history | 1,033 (44%) | 482 (48%) | 551 (41%) |
| Unknown | 1,162 | 493 | 669 |
| ECOG | 1,327 (57%) | 583 (59%) | 744 (55%) |
| Unknown | 1,162 | 493 | 669 |
| Index year | | | |
| <2018 | 3,324 (95%) | 1,400 (94%) | 1,924 (96%) |
| 2018+ | 176 (5.0%) | 87 (5.9%) | 89 (4.4%) |

1 Median (Q1, Q3); n (%)
3.2 Step 1 - Multiple imputation
The first step after deriving the analytic cohort is the creation of multiply imputed datasets using the mice R package (Buuren and Groothuis-Oudshoorn 2011).
The mice algorithm is one particular instance of a fully conditionally specified model. The algorithm starts with a random draw from the observed data and imputes the incomplete data in a variable-by-variable fashion; one iteration consists of one cycle through all \(Y_j\).
The number of imputed datasets \(M\) in this example is 10, but in ENCORE we follow Stef van Buuren’s advice:
[…] if calculation is not prohibitive, we may set \(M\) to the average percentage of missing data.
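As a minimal sketch of that advice (using a made-up toy data frame, not the ENCORE cohort), the average percentage of missing data can be computed directly and used as a suggestion for \(M\):

```r
# Toy sketch: set M to the average percentage of incomplete cells per variable.
# `df` is an invented example, not the ENCORE data.
df <- data.frame(
  x = c(1, NA, 3, 4, NA),
  y = c(NA, 2, 3, NA, 5),
  z = 1:5
)
pct_missing <- round(100 * mean(colMeans(is.na(df))))  # (40% + 40% + 0%) / 3
m_suggested <- max(pct_missing, 5)  # never below the mice default of m = 5
```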
Following the results of various simulation studies (Shah et al. 2014; Weberpals et al. 2024), we use a non-parametric (random forest-based) imputation approach as the actual imputation algorithm.
Parametric imputation models have to be correctly specified, i.e. they also have to explicitly model nonlinear and non-additive covariate relationships.
Many imputation algorithms are not prepared for mixed types of data.
Popular: random forest-based algorithms
For each variable, a random forest is fit on the observed part and then predicts the missing part.
missForest (Stekhoven and Bühlmann 2012) provides an OOB error but only provides single imputations.
Alternatives: rf and cart in the mice package (Buuren and Groothuis-Oudshoorn 2011).
Note: In this example we utilize the futuremice() function instead of the legacy mice() function to run the mice imputation in parallel across all but one of the available cores (nine on the machine used here).
# impute data
data_imp <- futuremice(
parallelseed = 42,
n.core = parallel::detectCores()-1,
data = data_miss,
method = "rf",
m = 10,
print = FALSE
)
The imputation step creates an object of class…
class(data_imp)
[1] "mids"
…which stands for multiply imputed data set. It contains important information on the imputation procedure and the actual imputed datasets.
3.3 Step 2 - Propensity score matching and weighting
Apply propensity score matching and weighting within each imputed dataset. As pointed out in Section 2.3.1, the MIte approach performed best in terms of bias, standardized differences/balance, coverage rate and variance estimation. In MatchThem this approach is referred to as the within approach (performing matching within each dataset), while the inferior MIps approach (estimating propensity scores within each dataset, averaging them across datasets, and performing matching on the averaged propensity scores in each dataset) is referred to as the across approach. Since MIte/within has been shown to have superior performance in most cases, we only illustrate this approach here.
Let’s assume we fit the following propensity score model within each imputed dataset.
# apply propensity score matching on mids object
ps_form <- as.formula(paste("treat ~", paste(covariates_for_ps, collapse = " + ")))
ps_form
treat ~ dem_age_index_cont + dem_sex_cont + c_smoking_history +
c_number_met_sites + c_hemoglobin_g_dl_cont + c_urea_nitrogen_mg_dl_cont +
c_platelets_10_9_l_cont + c_calcium_mg_dl_cont + c_glucose_mg_dl_cont +
c_lymphocyte_leukocyte_ratio_cont + c_alp_u_l_cont + c_protein_g_l_cont +
c_alt_u_l_cont + c_albumin_g_l_cont + c_bilirubin_mg_dl_cont +
c_chloride_mmol_l_cont + c_monocytes_10_9_l_cont + c_eosinophils_leukocytes_ratio_cont +
c_ldh_u_l_cont + c_hr_cont + c_sbp_cont + c_oxygen_cont +
c_ecog_cont + c_neutrophil_lymphocyte_ratio_cont + c_bmi_cont +
c_ast_alt_ratio_cont + c_stage_initial_dx_cont + dem_race +
dem_region + dem_ses + c_time_dx_to_index
The matching step happens using the matchthem() function, which is a wrapper around the matchit() function. This function not only provides the functionality to match on the propensity score, but also to perform (coarsened) exact matching, cardinality matching, genetic matching and more. In this example, we use a simple 1:1 nearest neighbor matching on the propensity score (estimated through logistic regression) without replacement with a caliper of 1% of the standard deviation of the propensity score.
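As a toy base-R illustration of the idea (this is not the actual matchit() implementation, and the numbers are invented), greedy 1:1 nearest-neighbor matching on the propensity score with a caliper can be sketched as:

```r
# Toy sketch of greedy 1:1 nearest-neighbor caliper matching on the
# propensity score; invented scores, not the FLAURA cohort.
ps_treated <- c(0.30, 0.52, 0.70)
ps_control <- c(0.28, 0.33, 0.55, 0.90)
caliper    <- 0.05  # on the propensity score scale, for simplicity
available  <- rep(TRUE, length(ps_control))
match_idx  <- rep(NA_integer_, length(ps_treated))
for (i in seq_along(ps_treated)) {
  d <- abs(ps_control - ps_treated[i])
  d[!available] <- Inf                 # without replacement
  j <- which.min(d)
  if (d[j] <= caliper) {
    match_idx[i] <- j
    available[j] <- FALSE
  }
}
match_idx  # third treated unit has no control within the caliper
```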
# matching
mimids_data <- matchthem(
formula = ps_form,
datasets = data_imp,
approach = 'within',
method = 'nearest',
distance = "glm",
link = "logit",
caliper = 0.01,
ratio = 1,
replace = FALSE
)
# print summary for matched dataset #1
mimids_data
A matchit object
- method: 1:1 nearest neighbor matching without replacement
- distance: Propensity score [caliper]
- estimated with logistic regression
- caliper: <distance> (0.001)
- number of obs.: 3500 (original), 2672 (matched)
- target estimand: ATT
- covariates: dem_age_index_cont, dem_sex_cont, c_smoking_history, c_number_met_sites, c_hemoglobin_g_dl_cont, c_urea_nitrogen_mg_dl_cont, c_platelets_10_9_l_cont, c_calcium_mg_dl_cont, c_glucose_mg_dl_cont, c_lymphocyte_leukocyte_ratio_cont, c_alp_u_l_cont, c_protein_g_l_cont, c_alt_u_l_cont, c_albumin_g_l_cont, c_bilirubin_mg_dl_cont, c_chloride_mmol_l_cont, c_monocytes_10_9_l_cont, c_eosinophils_leukocytes_ratio_cont, c_ldh_u_l_cont, c_hr_cont, c_sbp_cont, c_oxygen_cont, c_ecog_cont, c_neutrophil_lymphocyte_ratio_cont, c_bmi_cont, c_ast_alt_ratio_cont, c_stage_initial_dx_cont, dem_race, dem_region, dem_ses, c_time_dx_to_index
The resulting “mimids” object contains the original imputed data and the output of the calls to matchit() applied to each imputed dataset.
The weighting step is performed very similarly using the weightthem() function. In this example we apply SMR weighting to arrive at the same ATT estimand as matching, which is indicated through the estimand = "ATT" argument. In case we wanted to weight patients based on overlap weights, estimand = "ATO" would need to be specified (which is one of the sensitivity analyses in the FLAURA protocol).
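The SMR (ATT) weights themselves are simple: treated patients receive a weight of 1 and comparators receive the odds of their propensity score, which re-weights the comparator group towards the covariate distribution of the treated. A minimal sketch with made-up propensity scores:

```r
# Toy sketch of SMR/ATT weights: w = 1 for treated, ps / (1 - ps) for comparators.
ps    <- c(0.20, 0.50, 0.80, 0.40)  # invented propensity scores
treat <- c(1, 0, 0, 1)
w_smr <- ifelse(treat == 1, 1, ps / (1 - ps))
w_smr
```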
To mitigate the risk of extreme weights, the subsequent trim() function truncates large weights by setting all weights above a given quantile (in this example the 95th percentile) to the weight at that quantile. Since we specify lower = TRUE, the same is done symmetrically at the 5th percentile.
# SMR weighting
wimids_data <- weightthem(
formula = ps_form,
datasets = data_imp,
approach = 'within',
method = "glm",
estimand = "ATT"
)
# trim extreme weights
wimids_data <- trim(
x = wimids_data,
at = .95,
lower = TRUE
)
wimids_data
A weightit object
- method: "glm" (propensity score weighting with GLM)
- number of obs.: 3500
- sampling weights: none
- treatment: 2-category
- estimand: ATT (focal: 1)
- covariates: dem_age_index_cont, dem_sex_cont, c_smoking_history, c_number_met_sites, c_hemoglobin_g_dl_cont, c_urea_nitrogen_mg_dl_cont, c_platelets_10_9_l_cont, c_calcium_mg_dl_cont, c_glucose_mg_dl_cont, c_lymphocyte_leukocyte_ratio_cont, c_alp_u_l_cont, c_protein_g_l_cont, c_alt_u_l_cont, c_albumin_g_l_cont, c_bilirubin_mg_dl_cont, c_chloride_mmol_l_cont, c_monocytes_10_9_l_cont, c_eosinophils_leukocytes_ratio_cont, c_ldh_u_l_cont, c_hr_cont, c_sbp_cont, c_oxygen_cont, c_ecog_cont, c_neutrophil_lymphocyte_ratio_cont, c_bmi_cont, c_ast_alt_ratio_cont, c_stage_initial_dx_cont, dem_race, dem_region, dem_ses, c_time_dx_to_index
- weights trimmed at 5% and 95%
The resulting “wimids” object contains the original imputed data and the output of the calls to weightit() applied to each imputed dataset.
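What trim() does to the weights can be sketched in base R (with stand-in weights, not the actual wimids weights):

```r
# Toy sketch of symmetric weight truncation at the 5th/95th percentiles.
set.seed(42)
w  <- rexp(1000)                # stand-in for estimated SMR weights
hi <- quantile(w, 0.95)
lo <- quantile(w, 0.05)         # lower = TRUE also truncates small weights
w_trimmed <- pmin(pmax(w, lo), hi)
range(w_trimmed)                # no weight remains outside [lo, hi]
```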
3.4 Step 3 - Balance assessment
Balance assessment in multiply imputed and matched/weighted data can be done in a similar way as with a single complete dataset. For illustration we only look at the matched datasets, but the exact same principles also apply to the weighted datasets.
# create balance table
balance_table <- bal.tab(
x = mimids_data,
stats = "m",
abs = TRUE
)
balance_table
Balance summary across all imputations
Type Mean.Diff.Adj Max.Diff.Adj
distance Distance 0.0052 0.0056
dem_age_index_cont Contin. 0.0164 0.0431
dem_sex_cont Binary 0.0038 0.0095
c_smoking_history Binary 0.0071 0.0135
c_number_met_sites Contin. 0.0095 0.0243
c_hemoglobin_g_dl_cont Contin. 0.0080 0.0196
c_urea_nitrogen_mg_dl_cont Contin. 0.0144 0.0405
c_platelets_10_9_l_cont Contin. 0.0123 0.0367
c_calcium_mg_dl_cont Contin. 0.0146 0.0287
c_glucose_mg_dl_cont Contin. 0.0138 0.0304
c_lymphocyte_leukocyte_ratio_cont Contin. 0.0115 0.0262
c_alp_u_l_cont Contin. 0.0143 0.0250
c_protein_g_l_cont Contin. 0.0144 0.0259
c_alt_u_l_cont Contin. 0.0123 0.0298
c_albumin_g_l_cont Contin. 0.0148 0.0418
c_bilirubin_mg_dl_cont Contin. 0.0213 0.0347
c_chloride_mmol_l_cont Contin. 0.0147 0.0328
c_monocytes_10_9_l_cont Contin. 0.0154 0.0284
c_eosinophils_leukocytes_ratio_cont Contin. 0.0156 0.0216
c_ldh_u_l_cont Contin. 0.0139 0.0284
c_hr_cont Contin. 0.0124 0.0335
c_sbp_cont Contin. 0.0118 0.0263
c_oxygen_cont Contin. 0.0143 0.0348
c_ecog_cont Binary 0.0054 0.0134
c_neutrophil_lymphocyte_ratio_cont Contin. 0.0149 0.0270
c_bmi_cont Contin. 0.0153 0.0392
c_ast_alt_ratio_cont Contin. 0.0179 0.0427
c_stage_initial_dx_cont Contin. 0.0105 0.0316
dem_race_Asian Binary 0.0060 0.0174
dem_race_Other Binary 0.0011 0.0030
dem_race_White Binary 0.0070 0.0166
dem_region_Midwest Binary 0.0040 0.0090
dem_region_Northeast Binary 0.0037 0.0118
dem_region_South Binary 0.0042 0.0150
dem_region_West Binary 0.0049 0.0089
dem_ses Contin. 0.0113 0.0370
c_time_dx_to_index Contin. 0.0095 0.0245
Average sample sizes across imputations
0 1
All 1487. 2013.
Matched 1349.8 1349.8
Unmatched 137.2 663.2
love.plot(
x = mimids_data,
abs = TRUE,
thresholds = 0.1,
drop.distance = TRUE,
var.order = "unadjusted",
colors = c("orange", "blue"),
stars = "std",
shapes = 17,
size = 4,
grid = TRUE,
position = "top"
)
bal.plot(
x = mimids_data,
var.name = "distance",
which = "both",
which.imp = .none,
colors = c("orange", "blue")
)
For power calculations, we use the method proposed by Schoenfeld (Schoenfeld 1983) to compute the type II error rate \(\beta\) (and thus power \(1 - \beta\)). For this, we assume the following:
\(\alpha\) = 0.05 (two-sided)
% exposed (1:1 matching in main analysis) = 50%
HR (desired) = 0.8
Events = as observed in data
Since we have multiply imputed and matched datasets, we need to average the number of events before computing \(\beta\).
# make long dataset
data_long <- MatchThem::complete(
# datasets
data = mimids_data,
# produces a dataset with multiply imputed datasets stacked vertically
action = "long",
# do NOT include observations with a zero estimated weight (non-matched)
all = FALSE,
# do NOT include original dataset with missing values
include = FALSE
)
# compute average number of events
# by summing up all events
# and dividing by number of imputed datasets
avg_events <- sum(data_long$death_itt)/mimids_data$object$m
# compute beta
beta_gsDesign <- nEvents(
alpha = 0.05,
sided = 2,
n = avg_events,
hr = .8,
ratio = 1,
tbl = TRUE
)
# print results
cat("beta is", beta_gsDesign$beta, "\n")
beta is 7.069689e-05
cat("power is", (1-beta_gsDesign$beta)*100, "% \n")
power is 99.99293 %
# gsDesign table
beta_gsDesign
   hr      n alpha sided         beta     Power     delta ratio hr0         se
1 0.8 2670.5  0.05     2 7.069689e-05 0.9999293 0.1115718     1   1 0.03870203
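The gsDesign result can be cross-checked by writing out Schoenfeld's approximation directly: with \(d\) events, allocation fraction \(p\) and two-sided level \(\alpha\), power \(= \Phi\!\left(\sqrt{d\,p(1-p)}\,\lvert\log \mathrm{HR}\rvert - z_{1-\alpha/2}\right)\). A base-R sketch using the numbers above:

```r
# Schoenfeld (1983) approximation by hand, as a cross-check of nEvents().
d     <- 2670.5                     # average number of events (from above)
p     <- 0.5                        # 1:1 matching
hr    <- 0.8
alpha <- 0.05
power <- pnorm(sqrt(d * p * (1 - p)) * abs(log(hr)) - qnorm(1 - alpha / 2))
1 - power                           # beta, matching the gsDesign output above
```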
3.5 Step 4 - Estimation of marginal treatment effects
Next, we compare the marginal treatment effect estimates coming from a Cox proportional hazards model after propensity score matching and weighting as implemented in the coxph() and in the svycoxph() functions.
From the MatchThem documentation:
with() applies the supplied model in expr to the (matched or weighted) multiply imputed datasets, automatically incorporating the (matching) weights when possible. The argument to expr should be of the form glm(y ~ z, family = quasibinomial), for example, excluding the data or weights argument, which are automatically supplied.
Functions from the survey package, such as svyglm(), are treated a bit differently. No svydesign object needs to be supplied because with() automatically constructs and supplies it with the imputed dataset and estimated weights. When cluster = TRUE (or with() detects that pairs should be clustered; see the cluster argument above), pair membership is supplied to the ids argument of svydesign().
After weighting using weightthem(), glm_weightit() should be used as the modeling function to fit generalized linear models. It correctly produces robust standard errors that account for estimation of the weights, if possible. See WeightIt::glm_weightit() for details. Otherwise, svyglm() should be used rather than glm() in order to correctly compute standard errors.
For Cox models, coxph() will produce approximately correct standard errors when used with weighting, but svycoxph() will produce more accurate standard errors when matching is used.
We now want to compare treatment effect estimates for treat when computed (a) using coxph (survival package) and (b) svycoxph (survey package). More information on estimating treatment effects after matching is provided at https://kosukeimai.github.io/MatchIt/articles/estimating-effects.html#survival-outcomes
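In both workflows, the per-imputation estimates are combined with Rubin's rules (this is what pool() does under the hood). A base-R sketch with invented log-HR estimates and standard errors:

```r
# Toy sketch of Rubin's rules as applied by pool() to M log(HR) estimates.
est <- c(-0.35, -0.37, -0.33, -0.36, -0.34)   # invented per-imputation log HRs
se  <- c(0.062, 0.060, 0.065, 0.061, 0.063)   # invented per-imputation SEs
m   <- length(est)
qbar  <- mean(est)               # pooled point estimate
ubar  <- mean(se^2)              # within-imputation variance
b     <- var(est)                # between-imputation variance
t_var <- ubar + (1 + 1/m) * b    # total variance (Rubin, 1987)
c(pooled_hr = exp(qbar), pooled_se = sqrt(t_var))
```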
3.5.0.1 coxph
# coxph result
coxph_results <- with(
data = mimids_data,
expr = coxph(formula = Surv(fu_itt_months, death_itt) ~ treat,
weights = weights,
cluster = subclass,
robust = TRUE
)
) |>
pool() |>
tidy(exponentiate = TRUE, conf.int = TRUE) |>
mutate(package = "survival") |>
select(package, term, estimate, std.error, conf.low, conf.high)
coxph_results
3.5.0.2 svycoxph
# svycoxph result
svycoxph_results <- with(
data = mimids_data,
expr = svycoxph(formula = Surv(fu_itt_months, death_itt) ~ treat),
cluster = TRUE
) |>
pool() |>
tidy(exponentiate = TRUE, conf.int = TRUE) |>
mutate(package = "survey") |>
select(package, term, estimate, std.error, conf.low, conf.high)
svycoxph_results
3.5.0.3 Summary
rbind(coxph_results, svycoxph_results)
   package  term  estimate  std.error  conf.low conf.high
1 survival treat 0.6980895 0.04378617 0.6402933 0.7611027
2   survey treat 0.6980895 0.04379887 0.6403105 0.7610823
We repeat the same comparison using the SMR-weighted (wimids) datasets.
3.5.0.4 coxph
# coxph result
coxph_results <- with(
data = wimids_data,
expr = coxph(formula = Surv(fu_itt_months, death_itt) ~ treat,
weights = weights,
robust = TRUE
)
) |>
pool() |>
tidy(exponentiate = TRUE, conf.int = TRUE) |>
mutate(package = "survival") |>
select(package, term, estimate, std.error, conf.low, conf.high)
coxph_results
3.5.0.5 svycoxph
# svycoxph result
svycoxph_results <- with(
data = wimids_data,
expr = svycoxph(formula = Surv(fu_itt_months, death_itt) ~ treat),
cluster = TRUE
) |>
pool() |>
tidy(exponentiate = TRUE, conf.int = TRUE) |>
mutate(package = "survey") |>
select(package, term, estimate, std.error, conf.low, conf.high)
svycoxph_results
3.5.0.6 Summary
rbind(coxph_results, svycoxph_results)
   package  term  estimate  std.error  conf.low conf.high
1 survival treat 0.7031986 0.03545543 0.6559682 0.7538297
2   survey treat 0.7031986 0.03546033 0.6559784 0.7538180
3.6 Session info
Script runtime: 0.37 minutes.
pander::pander(subset(data.frame(sessioninfo::package_info()), attached==TRUE, c(package, loadedversion)))

| package | loadedversion |
|---|---|
| cobalt | 4.5.5 |
| dplyr | 1.1.4 |
| furrr | 0.3.1 |
| future | 1.34.0 |
| gsDesign | 3.6.4 |
| gtsummary | 2.0.1 |
| here | 1.0.1 |
| MatchThem | 1.2.1 |
| Matrix | 1.7-0 |
| mice | 3.16.0 |
| parallelly | 1.38.0 |
| ranger | 0.16.0 |
| survey | 4.4-2 |
| survival | 3.5-8 |
pander::pander(sessionInfo())
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
locale: en_US.UTF-8||en_US.UTF-8||en_US.UTF-8||C||en_US.UTF-8||en_US.UTF-8
attached base packages: grid, stats, graphics, grDevices, datasets, utils, methods and base
other attached packages: gsDesign(v.3.6.4), cobalt(v.4.5.5), furrr(v.0.3.1), future(v.1.34.0), ranger(v.0.16.0), parallelly(v.1.38.0), gtsummary(v.2.0.1), here(v.1.0.1), survey(v.4.4-2), Matrix(v.1.7-0), MatchThem(v.1.2.1), mice(v.3.16.0), survival(v.3.5-8) and dplyr(v.1.1.4)
loaded via a namespace (and not attached): tidyselect(v.1.2.1), farver(v.2.1.2), fastmap(v.1.2.0), digest(v.0.6.37), rpart(v.4.1.23), lifecycle(v.1.0.4), magrittr(v.2.0.3), compiler(v.4.4.0), rlang(v.1.1.4), sass(v.0.4.9), tools(v.4.4.0), utf8(v.1.2.4), yaml(v.2.3.10), gt(v.0.11.0), knitr(v.1.48), labeling(v.0.4.3), htmlwidgets(v.1.6.4), xml2(v.1.3.6), r2rtf(v.1.1.1), withr(v.3.0.1), purrr(v.1.0.2), nnet(v.7.3-19), fansi(v.1.0.6), jomo(v.2.7-6), xtable(v.1.8-4), colorspace(v.2.1-1), ggplot2(v.3.5.1), globals(v.0.16.3), scales(v.1.3.0), iterators(v.1.0.14), MASS(v.7.3-60.2), cli(v.3.6.3), rmarkdown(v.2.28), crayon(v.1.5.3), generics(v.0.1.3), rstudioapi(v.0.16.0), sessioninfo(v.1.2.2), commonmark(v.1.9.1), minqa(v.1.2.8), DBI(v.1.2.3), pander(v.0.6.5), stringr(v.1.5.1), splines(v.4.4.0), assertthat(v.0.2.1), parallel(v.4.4.0), base64enc(v.0.1-3), mitools(v.2.4), vctrs(v.0.6.5), WeightIt(v.1.3.0), boot(v.1.3-30), glmnet(v.4.1-8), jsonlite(v.1.8.8), mitml(v.0.4-5), listenv(v.0.9.1), locfit(v.1.5-9.10), foreach(v.1.5.2), tidyr(v.1.3.1), glue(v.1.7.0), nloptr(v.2.1.1), pan(v.1.9), chk(v.0.9.2), codetools(v.0.2-20), stringi(v.1.8.4), shape(v.1.4.6.1), gtable(v.0.3.5), lme4(v.1.1-35.5), munsell(v.0.5.1), tibble(v.3.2.1), pillar(v.1.9.0), htmltools(v.0.5.8.1), R6(v.2.5.1), rprojroot(v.2.0.4), evaluate(v.0.24.0), lattice(v.0.22-6), markdown(v.1.13), backports(v.1.5.0), cards(v.0.2.1), tictoc(v.1.2.1), MatchIt(v.4.5.5), broom(v.1.0.6), renv(v.1.0.7), simsurv(v.1.0.0), Rcpp(v.1.0.13), nlme(v.3.1-164), xfun(v.0.47) and pkgconfig(v.2.0.3)
pander::pander(options('repos'))
repos:
REPO_NAME https://packagemanager.posit.co/cran/latest